latent context
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Health & Medicine (0.67)
- Energy (0.46)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Robots (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Data Science > Data Mining (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
Tractable Optimality in Episodic Latent MABs
We consider a multi-armed bandit problem with $M$ latent contexts, where an agent interacts with the environment for an episode of $H$ time steps. Depending on the length of the episode, the learner may not be able to accurately estimate the latent context. The resulting partial observability of the environment makes the learning task significantly more challenging. Without any additional structural assumptions, existing techniques for partially observed settings imply that the decision maker can learn a near-optimal policy with $O(A)^H$ episodes, but do not promise more. In this work, we show that learning with {\em polynomial} samples in $A$ is possible. We achieve this by using techniques from experiment design. Then, through a method-of-moments approach, we design a procedure that provably learns a near-optimal policy with $O(\mathrm{poly}(A) + \mathrm{poly}(M,H)^{\min(M,H)})$ interactions. In practice, we show that we can formulate the moment matching via maximum likelihood estimation. In our experiments, this approach significantly outperforms the worst-case guarantees, as well as existing practical methods.
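To make the setting concrete, below is a minimal sketch of an episodic latent MAB of this kind. It is illustrative only: the uniform context prior, the Bernoulli rewards, and all names (`LatentMAB`, `run_episode`) are assumptions, not the paper's construction or algorithm.

```python
import numpy as np

class LatentMAB:
    """Minimal episodic latent multi-armed bandit (illustrative assumption,
    not the paper's model): M latent contexts, A arms, episodes of H steps."""

    def __init__(self, means, horizon, seed=0):
        self.means = np.asarray(means)        # (M, A) Bernoulli reward means
        self.M, self.A = self.means.shape
        self.H = horizon
        self.rng = np.random.default_rng(seed)
        self.context = None

    def reset(self):
        # One hidden context per episode, drawn here from a uniform prior.
        self.context = self.rng.integers(self.M)

    def pull(self, arm):
        # Bernoulli reward from the hidden context's mean for this arm.
        return float(self.rng.random() < self.means[self.context, arm])


def run_episode(env, policy):
    """Roll out H steps; the learner only ever sees (arm, reward) pairs."""
    env.reset()
    history = []
    for _ in range(env.H):
        arm = policy(history)
        history.append((arm, env.pull(arm)))
    return history


# Toy usage: two latent contexts with opposite best arms, explored uniformly.
env = LatentMAB(means=[[0.9, 0.1], [0.1, 0.9]], horizon=5)
uniform = lambda hist: env.rng.integers(env.A)
data = [run_episode(env, uniform) for _ in range(100)]
```

Episode histories of this form are the raw material that a moment-matching or maximum-likelihood procedure would then fit to recover the hidden reward means.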
- Information Technology > Data Science > Data Mining > Big Data (0.98)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)
- North America > United States > California (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Leisure & Entertainment (0.67)
- Media (0.67)
- Education > Educational Setting (0.67)
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
We introduce the first sample-efficient algorithm for latent MDPs (LMDPs) without any additional distributional assumptions. Our result builds on a new perspective on the role of off-policy evaluation guarantees and coverage coefficients in LMDPs, a perspective that has been overlooked in the context of exploration in partially observed environments.
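For readers unfamiliar with the estimator the abstract alludes to, here is a hedged sketch of trajectory-level importance-sampling off-policy evaluation. It is a generic illustration under assumed tabular policies, not the paper's algorithm; the coverage coefficient is precisely what controls how large the per-trajectory weights below can grow.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_target, pi_behavior, gamma=1.0):
    """Estimate the target policy's value from behavior-policy data.

    trajectories: list of episodes, each a list of (state, action, reward).
    pi_target, pi_behavior: dicts mapping state -> array of action
    probabilities (a tabular representation assumed purely for illustration).
    """
    estimates = []
    for episode in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in episode:
            weight *= pi_target[s][a] / pi_behavior[s][a]  # likelihood ratio
            ret += discount * r
            discount *= gamma
        estimates.append(weight * ret)  # per-episode importance-weighted return
    return float(np.mean(estimates))
```

The variance of this estimator is governed by how large the cumulative likelihood ratios can get, which is the quantity a coverage coefficient bounds.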
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
- Health & Medicine (0.67)
- Energy (0.46)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- North America > United States > California (0.14)
- North America > United States > Massachusetts (0.14)
- Leisure & Entertainment (0.46)
- Media (0.46)
- Information Technology (0.46)
- Education > Educational Setting (0.46)
A circuit for predicting hierarchical structure in-context in Large Language Models
Tankred Saanum, Can Demircan, Samuel J. Gershman, Eric Schulz
Large Language Models (LLMs) excel at in-context learning, the ability to use information provided as context to improve prediction of future tokens. Induction heads have been argued to play a crucial role in in-context learning in Transformer language models. These attention heads make a token attend to successors of past occurrences of the same token in the input. This basic mechanism supports LLMs' ability to copy and predict repeating patterns. However, it is unclear whether this same mechanism can support in-context learning of more complex repetitive patterns with hierarchical structure. Natural language is teeming with such cases: the article "the" in English usually prefaces multiple nouns in a text. When predicting which token succeeds a particular instance of "the", we need to integrate further contextual cues from the text to predict the correct noun. If induction heads naively attend to the successors of all past instances of "the" in a context-independent manner, they cannot support this level of contextual information integration. In this study, we design a synthetic in-context learning task in which tokens are repeated with hierarchical dependencies. Here, attending uniformly to all successor tokens is not sufficient to accurately predict future tokens. Evaluating a range of LLMs on these token sequences and natural-language analogues, we find adaptive induction heads that support prediction by learning what to attend to in-context. Next, we investigate how induction heads themselves learn in-context. We find evidence that this learning is supported by attention heads that uncover a set of latent contexts determining the different token-transition relationships. Overall, we not only show that LLMs have induction heads that learn, but also offer a complete mechanistic account of how LLMs learn to predict higher-order repetitive patterns in-context.
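As a concrete illustration of the basic induction-head mechanism described above, the sketch below computes a standard induction score for a single attention head on a repeated token sequence. The shapes, the repeated-random-token setup, and the function name are assumptions for illustration; the attention pattern itself would come from a Transformer's forward pass.

```python
import numpy as np

def induction_score(attn, tokens):
    """Average attention a head places on the successors of earlier
    occurrences of the current token.

    attn:   (T, T) attention pattern for one head (rows = queries, cols = keys).
    tokens: length-T sequence of token ids.
    """
    T = len(tokens)
    scores = []
    for i in range(1, T):
        # Key positions immediately after earlier occurrences of tokens[i].
        successors = [j + 1 for j in range(i - 1) if tokens[j] == tokens[i]]
        if successors:
            scores.append(attn[i, successors].sum())
    return float(np.mean(scores)) if scores else 0.0


# Toy input: a random sequence repeated twice, so every token in the second
# half has an earlier occurrence whose successor is the correct next token.
rng = np.random.default_rng(0)
half = rng.integers(0, 50, size=32)
tokens = np.concatenate([half, half])
# `attn` would be the (T, T) pattern of one head from a model run on `tokens`;
# a classic induction head scores high here, while the paper's hierarchical
# sequences are built so that uniform attention to successors is not enough.
```

On sequences with hierarchical dependencies, a high induction score alone does not imply correct prediction, which is what motivates the adaptive, context-sensitive heads the study reports.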
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (4 more...)
- Government (0.46)
- Health & Medicine (0.46)